
    Classification with Costly Features using Deep Reinforcement Learning

    We study a classification problem where each feature can be acquired for a cost, and the goal is to optimize the trade-off between the expected classification error and the feature cost. We revisit a former approach that framed the problem as a sequential decision-making problem and solved it by Q-learning with a linear approximation, where individual actions either request feature values or terminate the episode by providing a classification decision. On a set of eight problems, we demonstrate that by replacing the linear approximation with neural networks, the approach becomes comparable to state-of-the-art algorithms developed specifically for this problem. The approach is flexible: it can be improved with any new reinforcement learning enhancement, it allows the inclusion of a pre-trained high-performance classifier, and unlike prior art, its performance is robust across all evaluated datasets.
    Comment: AAAI 2019
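
    A minimal sketch of the sequential formulation the abstract describes, for a single sample; the class name CwCFEnv, the trade-off weight lam, and the 0/1 terminal reward are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class CwCFEnv:
    """Hypothetical single-sample environment for Classification with
    Costly Features: acquire features one by one, or stop and classify."""

    def __init__(self, x, y, costs, n_classes, lam=1.0):
        self.x = np.asarray(x, dtype=float)   # full feature vector (hidden)
        self.y = y                            # true label
        self.costs = np.asarray(costs, dtype=float)
        self.n_classes = n_classes
        self.lam = lam                        # accuracy/cost trade-off weight
        self.mask = np.zeros(len(self.x), dtype=bool)

    def state(self):
        # Observed feature values plus a mask of what has been acquired.
        return np.concatenate([np.where(self.mask, self.x, 0.0),
                               self.mask.astype(float)])

    def step(self, action):
        if action < len(self.x):              # request one more feature value
            self.mask[action] = True
            return self.state(), -self.lam * self.costs[action], False
        pred = action - len(self.x)           # terminate with a class decision
        return self.state(), 1.0 if pred == self.y else 0.0, True
```

    A Q-learning agent trained over this state/action interface recovers the sequential formulation; the paper's point is that swapping the earlier linear approximator for a neural network makes the approach competitive.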

    Analysis of Hannan Consistent Selection for Monte Carlo Tree Search in Simultaneous Move Games

    Hannan consistency, or no external regret, is a key concept for learning in games. An action selection algorithm is Hannan consistent (HC) if its performance is eventually as good as selecting the best fixed action in hindsight. If both players in a zero-sum normal form game use a Hannan consistent algorithm, their average behavior converges to a Nash equilibrium (NE) of the game. A similar result is known for extensive form games, but the played strategies need to be Hannan consistent with respect to the counterfactual values, which are often difficult to obtain. We study zero-sum extensive form games with simultaneous moves but otherwise perfect information. These games generalize normal form games and are a special case of extensive form games. We study whether applying HC algorithms in each decision point of these games, directly to the observed payoffs, leads to convergence to a Nash equilibrium. This learning process corresponds to a class of Monte Carlo Tree Search algorithms, which are popular for playing simultaneous-move games but have no known performance guarantees. We show that using HC algorithms directly on the observed payoffs is not sufficient to guarantee convergence. With additional averaging over joint actions, convergence is guaranteed, but empirically slower. We further define an additional property of HC algorithms that is sufficient to guarantee convergence without the averaging, and we empirically show that commonly used HC algorithms have this property.
    Comment: arXiv admin note: substantial text overlap with arXiv:1509.0014
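
    As a concrete example of an HC selection rule, below is a minimal sketch of regret matching under full-information feedback (the bandit-feedback setting used inside MCTS would need importance-weighted payoff estimates); the self-play demo on matching pennies is illustrative only.

```python
import numpy as np

class RegretMatching:
    """Sketch of a Hannan-consistent action selector (regret matching)."""

    def __init__(self, n_actions):
        self.regret = np.zeros(n_actions)
        self.strategy_sum = np.zeros(n_actions)

    def strategy(self):
        pos = np.maximum(self.regret, 0.0)
        p = pos / pos.sum() if pos.sum() > 0 else np.full(len(pos), 1 / len(pos))
        self.strategy_sum += p          # the *average* strategy converges to NE
        return p

    def observe(self, payoffs, played):
        # External-regret update: each fixed action's payoff versus the
        # expected payoff of the mixed strategy actually played.
        self.regret += payoffs - played @ payoffs

# Self-play in matching pennies: average strategies approach the uniform NE.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])         # row player's payoff matrix
p1, p2 = RegretMatching(2), RegretMatching(2)
for _ in range(10_000):
    s1, s2 = p1.strategy(), p2.strategy()
    p1.observe(A @ s2, s1)                       # row player's payoffs
    p2.observe(-(s1 @ A), s2)                    # column player's payoffs
print(p1.strategy_sum / p1.strategy_sum.sum())   # -> roughly [0.5, 0.5]
```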

    Symbolic Relational Deep Reinforcement Learning based on Graph Neural Networks

    We focus on reinforcement learning (RL) in relational problems that are naturally defined in terms of objects, their relations, and manipulations. These problems are characterized by variable state and action spaces, and finding a fixed-length representation, required by most existing RL methods, is difficult, if not impossible. We present a deep RL framework based on graph neural networks and auto-regressive policy decomposition that naturally works with these problems and is completely domain-independent. We demonstrate the framework in three distinct domains and report the method's competitive performance and impressive zero-shot generalization over different problem sizes. In goal-oriented BlockWorld, we demonstrate multi-parameter actions with pre-conditions. In SysAdmin, we show how to select multiple objects simultaneously. In the classical planning domain of Sokoban, the method trained exclusively on 10x10 problems with three boxes solves 89% of 15x15 problems with five boxes.
    Comment: RL4RealLife @ ICML2021; code available at https://github.com/jaromiru/sr-dr
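
    A minimal numpy sketch of the two ingredients the abstract names: message passing over the object graph, and auto-regressive action decomposition (first pick an action type, then condition the object choice on it). All parameter names and dimensions here are hypothetical, not the authors' architecture.

```python
import numpy as np

def message_pass(node_feats, adj, w_self, w_neigh):
    # One round of mean-aggregation message passing with a ReLU.
    neigh = adj @ node_feats / np.maximum(adj.sum(1, keepdims=True), 1.0)
    return np.maximum(node_feats @ w_self + neigh @ w_neigh, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def select_action(node_feats, adj, params, rng):
    h = message_pass(node_feats, adj, params["w_self"], params["w_neigh"])
    g = h.mean(axis=0)                            # graph-level embedding
    a_type = rng.choice(len(params["w_type"]), p=softmax(params["w_type"] @ g))
    # Auto-regressive step: the object choice is conditioned on the type.
    obj = rng.choice(len(h), p=softmax(h @ params["w_obj"][a_type]))
    return a_type, obj

# Toy demo: 5 objects with 4 features each, 3 action types. Because scores are
# computed per node, the same policy applies to any number of objects, which is
# what allows zero-shot generalization over problem sizes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 4))
adj = (rng.random((5, 5)) < 0.4).astype(float)
params = {"w_self": rng.normal(size=(4, 8)), "w_neigh": rng.normal(size=(4, 8)),
          "w_type": rng.normal(size=(3, 8)), "w_obj": rng.normal(size=(3, 8))}
print(select_action(feats, adj, params, rng))
```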

    NASimEmu: Network Attack Simulator & Emulator for Training Agents Generalizing to Novel Scenarios

    Current frameworks for training offensive penetration-testing agents with deep reinforcement learning struggle to produce agents that perform well in real-world scenarios, due to the reality gap of simulation-based frameworks and the lack of scalability of emulation-based frameworks. Additionally, existing frameworks often use an unrealistic metric that measures the agents' performance on the training data. NASimEmu, a new framework introduced in this paper, addresses these issues by providing both a simulator and an emulator with a shared interface. This approach allows agents to be trained in simulation and deployed in the emulator, thus verifying the realism of the used abstraction. Our framework promotes the development of general agents that can transfer to novel scenarios unseen during training. For the simulation part, we adopt the existing simulator NASim and enhance its realism. The emulator is implemented with industry-level tools such as Vagrant, VirtualBox, and Metasploit. Experiments demonstrate that a simulation-trained agent can be deployed in emulation, and we show how to use the framework to train a general agent that transfers to novel, structurally different scenarios. NASimEmu is available as open source.
    Comment: NASimEmu is available at https://github.com/jaromiru/NASimEmu and the baseline agents at https://github.com/jaromiru/NASimEmu-agent
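
    The shared-interface idea can be sketched as follows; the names NetworkEnv, SimBackend, and EmuBackend are illustrative assumptions and do not mirror NASimEmu's actual API.

```python
from abc import ABC, abstractmethod

class NetworkEnv(ABC):
    """One agent-facing API, two interchangeable backends."""

    @abstractmethod
    def reset(self):
        """Start a new scenario and return the initial observation."""

    @abstractmethod
    def step(self, action):
        """Execute an action (e.g. scan/exploit); return (obs, reward, done)."""

class SimBackend(NetworkEnv):
    # Stub: a real implementation would hold the abstract network model.
    def reset(self):
        return {"discovered_hosts": []}               # placeholder observation

    def step(self, action):
        return {"discovered_hosts": []}, 0.0, True    # placeholder transition

class EmuBackend(NetworkEnv):
    # Stub: a real implementation would provision VMs (e.g. via Vagrant /
    # VirtualBox) and run actions against them (e.g. via Metasploit).
    def reset(self):
        return {"discovered_hosts": []}

    def step(self, action):
        return {"discovered_hosts": []}, 0.0, True

def run_episode(agent, env: NetworkEnv):
    # The same loop runs unchanged against either backend, which is what
    # lets a simulation-trained agent be verified in the emulator.
    obs, done = env.reset(), False
    while not done:
        obs, reward, done = env.step(agent.act(obs))
```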

    Hierarchical Multiple-Instance Data Classification with Costly Features

    We extend the framework of Classification with Costly Features (CwCF), which works with samples of fixed dimensions, to trees of varying depth and breadth (similar to a JSON/XML file). In this setting, a sample is a tree: sets of sets of features. Individually for each sample, the task is to sequentially select informative features that help the classification. Each feature has a real-valued cost, and the objective is to maximize accuracy while minimizing the total cost. The process is modeled as an MDP where the states represent the acquired features and the actions select unknown features. We present a specialized neural network architecture, trained through deep reinforcement learning, that naturally fits the data and directly selects features in the tree. We demonstrate our method on seven datasets and compare it to two baselines.
    Comment: RL4RealLife @ ICML2021; code available at https://github.com/jaromiru/rcwc
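
    A small illustration (hypothetical data, not from the paper) of the tree-shaped samples described above, and of how each still-unknown leaf feature corresponds to one acquisition action of the MDP.

```python
# A sample is nested sets of feature sets, as in a JSON document.
sample = {
    "user": {"age": 42, "premium": True},
    "sessions": [                         # a variable-size set of sub-objects
        {"duration": 31.5, "clicks": 7},
        {"duration": 3.0, "clicks": 1},
    ],
}

def leaves(node, path=()):
    """Enumerate leaf features by path; each path is one candidate action."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaves(value, path + (key,))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from leaves(value, path + (i,))
    else:
        yield path, node

acquired = {}                         # the MDP state: features bought so far
for path, value in leaves(sample):
    print(path, value)                # e.g. ('sessions', 0, 'clicks') 7
```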